Skip to main content
Version: 1.0.16

CREATE AGGREGATE

CREATE AGGREGATE — Define a new aggregate function

Synopsis

CREATE [ OR REPLACE ] AGGREGATE name ( [ argmode ] [ argname ] arg_data_type

[ , ... ] ) (

SFUNC = sfunc,

STYPE = state_data_type

[ , SSPACE = state_data_size ]

[ , FINALFUNC = ffunc ]

[ , FINALFUNC_EXTRA ]

[ , FINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]

[ , COMBINEFUNC = combinefunc ]

[ , SERIALFUNC = serialfunc ]

[ , DESERIALFUNC = deserialfunc ]

[ , INITCOND = initial_condition ]

[ , MSFUNC = msfunc ]

[ , MINVFUNC = minvfunc ]

[ , MSTYPE = mstate_data_type ]

[ , MSSPACE = mstate_data_size ]

[ , MFINALFUNC = mffunc ]

[ , MFINALFUNC_EXTRA ]

[ , MINITCOND = minitial_condition ]

[ , SORTOP = sort_operator ]

[ , PARALLEL = { SAFE | RESTRICTED | UNSAFE } ]

)

CREATE [ OR REPLACE ] AGGREGATE name ( [ [ argmode ] [ argname ] arg_data_type

[ , ... ] ]

ORDER BY [ argmode ] [ argname ] arg_data_type

[ , ... ] ) (

SFUNC = sfunc,

STYPE = state_data_type

[ , SSPACE = state_data_size ]

[ , FINALFUNC = ffunc ]

[ , FINALFUNC_EXTRA ]

[ , FINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]

[ , INITCOND = initial_condition ]

[ , HYPOTHETICAL ]

[ , PARALLEL = { SAFE | RESTRICTED | UNSAFE } ]

)

or the old syntax

CREATE [ OR REPLACE ] AGGREGATE name (

BASETYPE = base_type,

SFUNC = sfunc,

STYPE = state_data_type

[ , SSPACE = state_data_size ]

[ , FINALFUNC = ffunc ]

[ , FINALFUNC_EXTRA ]

[ , FINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]

[ , COMBINEFUNC = combinefunc ]

[ , SERIALFUNC = serialfunc ]

[ , DESERIALFUNC = deserialfunc ]

[ , INITCOND = initial_condition ]

[ , MSFUNC = msfunc ]

[ , MINVFUNC = minvfunc ]

[ , MSTYPE = mstate_data_type ]

[ , MSSPACE = mstate_data_size ]

[ , MFINALFUNC = mffunc ]

[ , MFINALFUNC_EXTRA ]

[ , MFINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE } ]

[ , MINITCOND = minitial_condition ]

[ , SORTOP = sort_operator ]

)

Description

CREATE AGGREGATE defines a new aggregate function. CREATE OR REPLACE AGGREGATE will either define a new aggregate function or replace an existing definition. Several basic and commonly used aggregate functions are already included in the release. If you are defining a new type or need an aggregate function that has not been provided, CREATE AGGREGATE can be used to provide the desired feature.

When replacing an existing definition, the parameter types, result type, and number of direct parameters may not be changed. Additionally, the type of the new definition (plain aggregate, ordered-set aggregate, or hypothetical-set aggregate) must be the same as the old definition.

If a schema name is given (e.g., CREATE AGGREGATE myschema.myagg ...), the aggregate will be created in the specified schema. Otherwise, it will be created in the current schema.

An aggregate function is identified by its name and input data types. Two aggregates in the same schema can have the same name, as long as they operate on different input types. The name and input data types of an aggregate must be distinct from every ordinary function in the same schema. This behavior is identical to ordinary function name overloading (see CREATE FUNCTION).

A plain aggregate function consists of one or more ordinary functions: a state transition function sfunc and an optional final calculation function ffunc. They are used as follows:

sfunc( internal-state, next-data-values ) ---> next-internal-state

ffunc( internal-state ) ---> aggregate-value

A temporary variable of data type stype is created to hold the current internal state of the aggregate. For each input row, the aggregate argument values are computed and the state transition function is called, which computes a new internal state value using the current state value and the new argument values. After all rows have been processed, the final function is called once to compute the aggregate's return value. If there is no final function, the ending state value is returned.

An aggregate function can provide an initial condition, that is, an initial value for the internal state value. It is specified as a value of type text and stored in the database, but it must be a valid external representation of a constant of the state value data type. If not provided, the state value starts out as null. If the state transition function is declared "strict", it cannot be called with null inputs. With such a transition function, the aggregate behaves as follows. Rows with any null values are ignored (the function is not called and the previous state value is retained). If the initial state value is null, then when the first row without null values is encountered, the state value is replaced by the first argument value, and the transition function is called for each subsequent row without null values. This is convenient for implementing aggregates like max. Note that this behavior is only available when state_data_type is the same as the first arg_data_type. When these types are different, you must provide a non-null initial condition or use a non-strict transition function.

If the state transition function is not strict, it will be called for every input row, and it must handle null inputs and null state values on its own. This allows the aggregate author full control over how the aggregate handles null values.

If the final function is declared "strict", it will not be called when the final state value is null, and a null result will be returned automatically (this is, of course, the normal behavior of strict functions). In any case, the final function can return a null value. For example, the final function for avg returns null when it sees zero input rows.

Sometimes it is useful to declare the final function to take not only the state value but also extra parameters corresponding to the aggregate's input values. The main reason for this is that if the final function is polymorphic, the state value's data type is unsuitable for determining the result type. These extra parameters are always passed as NULL (so the final function cannot be strict when using the FINALFUNC_EXTRA option), but they are nonetheless valid parameters. For example, the final function can use get_fn_expr_argtype to identify the actual argument types in the current call.

This requires specifying MSFUNC, MINVFUNC, and MSTYPE parameters, and the parameters MSSPACE, MFINALFUNC, MFINALFUNC_EXTRA, and MINITCOND are optional. Except for MINVFUNC, these parameters work similarly to the corresponding simple aggregate parameters without the M prefix; they define a separate implementation of an aggregate that includes an inverse transition function.

The syntax with ORDER BY in the parameter list creates a special aggregate type known as an ordered-set aggregate. If HYPOTHETICAL is specified, a hypothetical-set aggregate is created. These aggregates operate on sorted values in a sorting-dependent manner, so specifying an input sort order is an important part of the calling procedure. Also, they can have direct parameters, which are evaluated only once per aggregate computation, rather than once per input row. Hypothetical-set aggregates are a sub-class of ordered-set aggregates where some direct parameters are required to match the aggregated parameter columns in number and type. This allows the values of these direct parameters to be added to the set of aggregate input rows as an additional "hypothetical" row.

An aggregate can support partial aggregation. This requires specifying the COMBINEFUNC parameter. If state_data_type is internal, SERIALFUNC and DESERIALFUNC parameters can usually also be provided to enable parallel aggregation. Note that the aggregate must also be marked as PARALLEL SAFE to enable parallel aggregation.

Aggregates that behave like MIN or MAX can sometimes be optimized by directly examining an index rather than scanning every input row. If this aggregate can be optimized this way, indicate this by specifying a sort operator. The basic requirement is that the aggregate must produce the first element in the sort order produced by the operator, in other words:

SELECT agg(col) FROM tab;

must be equivalent to:

SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;

A further assumption is that the aggregate ignores null inputs and returns a null result if and only if there are no non-null inputs. Typically, the < operator for a data type is the appropriate sort operator for MIN, and > is the appropriate sort operator for MAX. Note that the optimization will never take effect unless the specified operator is a "less than" or "greater than" strategy member of a B-tree index operator class.

To be able to create an aggregate function, you must have USAGE privilege on the parameter types, state type, and return type, as well as EXECUTE privilege on the supporting functions.

Parameters

name

The name of the aggregate function to be created (can be schema-qualified).

argmode

The mode of an argument: IN or VARIADIC (aggregate functions do not support OUT arguments). If omitted, the default is IN. Only the last argument can be marked as VARIADIC.

argname

The name of an argument. Currently, this is only used for documentation purposes. If omitted, the argument has no name.

arg_data_type

An input data type that this aggregate function operates on. To create a zero-argument aggregate function, write * in place of the argument specification list (an example of such an aggregate is count(*)).

base_type

In the old syntax of CREATE AGGREGATE, the input data type is specified by a basetype parameter rather than written after the aggregate name. Note that this syntax only allows one input parameter. To define a zero-argument aggregate function with this syntax, specify basetype as "ANY" (not *). Ordered-set aggregates cannot be defined with the old syntax.

sfunc

The name of the state transition function to be called for each input row. For a normal N-argument aggregate function, sfunc must take N+1 arguments, where the first argument is of type state_data_type and the remaining arguments match the aggregate's declared input data types. The function must return a value of type state_data_type. This function takes the current state value and the current input data values and returns the next state value.

For ordered-set (including hypothetical-set) aggregates, the state transition function takes only the current state value and the aggregate arguments, but not the direct parameters. Otherwise, it works the same as other transition functions.

state_data_type

The data type for the aggregate's state value.

state_data_size

The approximate average size (in bytes) of the aggregate's state value. If this parameter is omitted or is zero, a default estimate based on state_data_type will be used. The planner uses this value to estimate the memory required for a grouped aggregate query.

ffunc

The name of the final function that is called after all input rows have been traversed to compute the aggregate result. For a regular aggregate, this function must accept only a single argument of type state_data_type. The aggregate's return data type is defined as the return type of this function. If ffunc is not specified, the ending state value is used as the aggregate result, and the return type is state_data_type.

For ordered-set (including hypothetical-set) aggregates, the final function receives not only the final state value but also the values of all direct parameters.

If FINALFUNC_EXTRA is specified, the final function receives additional NULL values corresponding to the aggregate's regular (aggregated) arguments, in addition to the final state value and any direct parameters. This is primarily used to allow correct determination of the aggregate's result type when defining a polymorphic aggregate.

FINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE }

This option specifies whether the final function is a pure function that does not modify its arguments. READ_ONLY indicates it does not modify; the other two values indicate it may change the transition state value. See Notes for more details. The default is READ_ONLY, except for ordered-set aggregates which use READ_WRITE as the default.

combinefunc

The combinefunc function can be optionally specified to allow the aggregate function to support partial aggregation. If provided, combinefunc must combine two state_data_type values, each containing the aggregate result over a subset of input values, and produce a new state_data_type representing the aggregate result over both input sets combined. This function can be thought of as an sfunc that, instead of operating on an individual input row and adding it to the running aggregate state, adds another aggregate state to the running state.

combinefunc must be declared with two state_data_type parameters and must return a state_data_type value. This function can optionally be marked as "strict". When so marked, the function will not be called if either input state is null; instead, the other state will be taken as the correct result.

For aggregate functions where state_data_type is internal, combinefunc cannot be strict. In this case, combinefunc must ensure correct handling of null states and that the returned state is properly stored in the aggregate memory context.

serialfunc

An aggregate function with state_data_type internal can participate in parallel aggregation if and only if it has a serialfunc function that serializes the aggregate state into a bytea value for transmission to another process. This function must have a single internal type parameter and return type bytea. A corresponding deserialfunc is also required.

deserialfunc

Deserializes a previously serialized aggregate state back into state_data_type. This function must have two parameters of types bytea and internal, and produce a result of type internal (Note: the second parameter of type internal is unused, but is required for type safety reasons).

initial_condition

The initial setting for the state value. This must be a string constant in a form acceptable to the data type state_data_type. If not specified, the state value starts out as null.

msfunc

The name of the forward state transition function, called for each input row in moving-aggregate mode. It is very similar to the regular transition function, but its first argument and result type are mstate_data_type, which may differ from state_data_type.

minvfunc

The name of the inverse state transition function used in moving-aggregate mode. This function has the same parameter and result types as msfunc, but it is used to remove a value from the current aggregate state rather than adding one to it.

The inverse transition function must have the same strictness property as the forward state transition function.

mstate_data_type

The data type for the aggregate state value when using moving-aggregate mode.

mstate_data_size

The approximate average size (in bytes) of the aggregate state value when using moving-aggregate mode. It serves the same purpose as state_data_size.

mffunc

The name of the final function used in moving-aggregate mode, called after all input rows have been traversed to compute the aggregate result. It works the same as ffunc, but its first argument type is mstate_data_type, and extra null parameters are specified by writing MFINALFUNC_EXTRA. The aggregate result type determined by mffunc or mstate_data_type must match the type determined by the aggregate's regular implementation.

MFINALFUNC_MODIFY = { READ_ONLY | SHAREABLE | READ_WRITE }

This option is similar to FINALFUNC_MODIFY, except that it describes the behavior of the moving-aggregate final function.

minitial_condition

The initial setting for the state value when using moving-aggregate mode. It serves the same purpose as initial_condition.

sort_operator

The associated sort operator for a MIN- or MAX-like aggregate. This is just an operator name (may be schema-qualified). The operator is assumed to have the same input data type as the aggregate (which must be a single-argument plain aggregate).

PARALLEL = { SAFE | RESTRICTED | UNSAFE }

PARALLEL SAFE, PARALLEL RESTRICTED, and PARALLEL UNSAFE have the same meanings as in CREATE FUNCTION. If an aggregate is marked as PARALLEL UNSAFE (default) or PARALLEL RESTRICTED, it will not be considered for parallelization. Note that the planner does not consult the parallel safety markings of the aggregate's supporting functions; it only considers such markings on the aggregate itself.

HYPOTHETICAL

Only for ordered-set aggregates, this flag specifies that the aggregate arguments are to be processed according to the requirements of hypothetical-set aggregates: that is, the trailing direct parameters must match the data types of the aggregated (WITHIN GROUP) parameters. The HYPOTHETICAL flag has no effect at runtime; it is only useful during command parsing for determining data types and collation of aggregate parameters.

The parameters of CREATE AGGREGATE can be written in any order, without needing to follow the order described above.

Notes

In parameters that specify supporting function names, you can write a schema name if needed, e.g., SFUNC = public.sum. You cannot write argument types here — the argument types of the supporting functions are determined from other parameters.

Typically, functions are pure functions that do not modify their input values. However, an aggregate transition function is allowed to cheat and modify the transition state argument in-place when used in the aggregate context, because this can provide substantial performance improvements compared to creating a new copy of the transition state each time.

Similarly, while aggregate final functions are generally not expected to modify their input values, sometimes it is impractical to avoid modifying the transition state argument. This behavior must be declared using the FINALFUNC_MODIFY parameter. The READ_WRITE value indicates that the final function modifies the transition state value in some unspecified way. This value prevents the aggregate from being used as a window function and also prevents merging of transition states from aggregate calls that share the same input values and transition function. The SHAREABLE value indicates that the transition function cannot be used after the final function, but multiple final function calls can be made on the final transition state value. This value prevents the aggregate from being used as a window function but allows merging of transition states. (That is, the optimization of concern here is not repeatedly using the same final function, but applying different final functions to the same final transition state value. This is allowed as long as all final functions are not marked as READ_WRITE.)

If an aggregate supports moving-aggregate mode, it will improve computational efficiency when the aggregate is used as a window function with a moving frame start (i.e., the frame start mode is not UNBOUNDED PRECEDING). Conceptually, the forward transition function adds input values to the aggregate state as they enter the window frame from the bottom, while the inverse transition function removes input values as they leave the frame from the top. Therefore, when values are removed, they are always removed in the same order they were added. Whenever the inverse transition function is called, it will therefore receive the most recently added but not yet removed argument value. The inverse transition function can assume that at least one row remains in the current state after it removes the oldest row (when this is not the case, the window function mechanism simply starts a new aggregate computation rather than using the inverse transition function).

The forward transition function for moving-aggregate mode is not allowed to return NULL as the new state value. If the inverse transition function returns NULL, this indicates that the inverse function cannot reverse the state computation for this particular input, and the aggregate computation must therefore be recalculated "from scratch" starting from the beginning of the current frame. In some rare cases, reversing the state computation in progress is impractical; this convention allows the use of moving-aggregate mode in such situations.

If no moving-aggregate implementation is provided, the aggregate can still be used with moving frames, but the system will recalculate the entire aggregate whenever the frame start moves. Note that regardless of whether the aggregate supports moving-aggregate mode, the system can handle a moving frame end without recalculation, simply by adding new values to the aggregate state. This is why using an aggregate as a window function requires the final function to be read-only. The final function is expected not to destroy the aggregate's state value, so that the aggregate can continue even after an aggregate result value has been obtained for one set of frame boundaries.

The syntax for ordered-set aggregates allows specifying VARIADIC for the last direct parameter and the last aggregated (WITHIN GROUP) parameter. However, the current implementation restricts the use of VARIADIC in two ways. First, ordered-set aggregates can only use VARIADIC "any", not other variadic array types. Second, if the last direct parameter is VARIADIC "any", then there can only be one aggregated parameter and it must also be VARIADIC "any" (in the representation used in the system catalog, these two parameters are merged into a single VARIADIC "any" item, because pg_proc cannot represent functions with more than one VARIADIC parameter). If the aggregate is a hypothetical-set aggregate, the direct parameters matching the VARIADIC "any" parameter are the hypothetical parameters. Any preceding parameters represent additional direct parameters that are not constrained to match the aggregated parameters.

Currently, ordered-set aggregates do not need to support moving-aggregate mode, because they cannot be used as window functions.

Partial (including parallel) aggregation is currently not supported by ordered-set aggregates. Also, aggregate calls that include DISTINCT or ORDER BY clauses will not use partial aggregation, because those semantics cannot be supported during partial aggregation.

See Also

ALTER AGGREGATE, DROP AGGREGATE